Data Visualization

Authors

Dr. Muhammad Abdul Hafiz bin Kamarul Zaman

Dr. Tengku Muhammad Huzaifah bin Tengku Mokhtar

Dr. Muhammad Za’im bin Mohd Samsuri

Published

June 15, 2025

1 Team picture

2 INTRODUCTION

2.1 PURPOSE

The purpose of this assignment is to provide a data visualization analysis of Zoonotic Malaria infection cases in Pahang state from 2011 to 2022. The visualizations give insight into the epidemiological profile of individuals infected with Zoonotic Malaria, which can help improve the control and prevention of malaria in Pahang specifically.

3 Overview of the Dataset

The dataset comprises information on 888 patients across 11 districts in Pahang from 2011 to 2022. This hierarchical dataset has two levels: patient-level factors (level 1) and districts (level 2). The variables are:

  • District (Daerah): Identifies where the infection happened (11 districts).
  • Age (Umur): The age of the patient when diagnosed with Zoonotic Malaria infection (in years).
  • Gender (Jantina): The gender of the patient diagnosed with Zoonotic Malaria infection.
  • Citizenship (Warganegara): Whether the infected patient is a Malaysian citizen (holds a legal document) or a foreigner who works and lives in Malaysia but does not possess a citizenship ID.
  • Forestry related work (Pekerjaan): Whether the patient's job scope is related to forestry or not.
  • Parasite density (KepadatanParasit): Total number of Plasmodium parasites observed under the microscope.
  • Year (Year): The year when the patient was diagnosed with Zoonotic Malaria (from 2011 to 2022).
  • Duration (duration_days): The duration from onset of symptoms to diagnosis, in days. A duration of more than 4 days might reflect a delay in diagnosis.
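
The 4-day cutoff mentioned for the duration variable can be turned into an explicit indicator. The following is a minimal sketch in base R on a toy vector; the values and the names `delayed` and `prop_delayed` are illustrative assumptions, not part of the dataset.

```r
# Toy onset-to-diagnosis durations in days (illustrative values only)
duration_days <- c(2, 4, 5, 9, 0, 12)

# Flag a case as delayed when diagnosis took more than 4 days
delayed <- duration_days > 4

# Proportion of delayed cases in the toy sample
prop_delayed <- mean(delayed)

print(data.frame(duration_days, delayed))
print(prop_delayed)
```

Applied to the real data, the same comparison on the duration column would give the share of cases diagnosed later than 4 days after onset.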

4 INSTALLING PACKAGES AND LOADING LIBRARIES

library(tidyverse)
library(ggplot2)
library(gtsummary)
library(readxl)
library(broom)
library(DT)
library(summarytools)
library(patchwork)
library(GGally)
library(gganimate)
library(gifski)

4.1 READ DATASET

data1 <- read_excel("knowlesi.xlsx")
View(data1)

4.2 Data wrangling

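# Convert every character column to a factor
data1 <- data1 %>% mutate(across(where(is.character), as_factor))
# Make sure parasite density is numeric (entries that cannot be parsed become NA)
data1$KepadatanParasit <- as.numeric(as.character(data1$KepadatanParasit))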
glimpse(data1)
Rows: 888
Columns: 18
$ Daerah              <fct> BENTONG, LIPIS, JERANTUT, LIPIS, RAUB, LIPIS, MARA…
$ Umur                <dbl> 22, 71, 28, 36, 32, 24, 22, 16, 30, 50, 52, 25, 34…
$ Jantina             <fct> MALE, MALE, MALE, MALE, MALE, MALE, MALE, MALE, FE…
$ Hamil               <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO…
$ Bangsa              <fct> INDIA, CINA, MELAYU, MELAYU, MELAYU, MELAYU, MELAY…
$ Warganegara         <fct> YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA…
$ Pekerjaan           <fct> FOREST RELATED, NON FOREST RELATED, NON FOREST REL…
$ Kawasan             <fct> RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, R…
$ KepadatanParasit    <dbl> 1200, 64000, 1500, 4640, 3040, 1120, 1800, 20000, …
$ dateNotifikasi      <dttm> 2011-01-12, 2011-01-14, 2011-02-09, 2011-03-04, 2…
$ dateOnset           <dttm> 2011-01-07, 2011-01-05, 2011-01-30, 2011-02-25, 2…
$ CaraPengesananKes   <fct> PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, …
$ KlasifikasiKes      <fct> INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIGENOUS, IN…
$ G6PD                <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO…
$ PreliminaryDiagnose <fct> UNCOMPLICATED, UNCOMPLICATED, UNCOMPLICATED, UNCOM…
$ PernahHidap         <fct> TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, T…
$ Yearx               <dttm> 2011-01-12, 2011-01-14, 2011-02-09, 2011-03-04, 2…
$ Year                <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20…

4.3 Construct new meaningful variables (time to diagnosis)

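# Interval from symptom onset to notification, as a lubridate duration (seconds)
data1 <- data1 %>% mutate(dur = as.duration(dateOnset %--% dateNotifikasi))
# Convert seconds to whole days; note that abs() makes negative intervals positive
data1 <- data1 %>%
  mutate(duration_days = as.integer(abs(as.numeric(dur)) / 86400))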
summary(data1)
      Daerah         Umur         Jantina    Hamil            Bangsa   
 LIPIS   :455   Min.   : 0.00   MALE  :742   NO :884   MELAYU    :512  
 JERANTUT:121   1st Qu.:25.00   FEMALE:146   YES:  4   ORANG ASLI:127  
 MARAN   : 75   Median :34.00                          INDONESIA :105  
 RAUB    : 71   Mean   :37.02                          CINA      : 46  
 ROMPIN  : 42   3rd Qu.:48.00                          BANGLADESH: 37  
 TEMERLUH: 41   Max.   :85.00                          INDIA     : 16  
 (Other) : 83                                          (Other)   : 45  
 Warganegara              Pekerjaan    Kawasan    KepadatanParasit   
 YA   :716   FOREST RELATED    :459   RURAL:797   Min.   :1.600e+01  
 TIDAK:172   NON FOREST RELATED:429   URBAN: 91   1st Qu.:1.445e+03  
                                                  Median :5.000e+03  
                                                  Mean   :1.929e+07  
                                                  3rd Qu.:2.070e+04  
                                                  Max.   :4.598e+09  
                                                  NA's   :1          
 dateNotifikasi                     dateOnset                     
 Min.   :2011-01-12 00:00:00.00   Min.   :2011-01-05 00:00:00.00  
 1st Qu.:2013-10-06 12:00:00.00   1st Qu.:2013-10-02 18:00:00.00  
 Median :2017-08-02 12:00:00.00   Median :2017-07-30 12:00:00.00  
 Mean   :2017-02-12 11:16:12.97   Mean   :2017-02-06 04:25:56.75  
 3rd Qu.:2020-03-05 06:00:00.00   3rd Qu.:2020-02-22 06:00:00.00  
 Max.   :2022-12-30 00:00:00.00   Max.   :2022-12-25 00:00:00.00  
                                                                  
 CaraPengesananKes    KlasifikasiKes  G6PD        PreliminaryDiagnose
 PCD:845           INDIGENOUS:888    NO :881   UNCOMPLICATED:707     
 ACD: 41                             YES:  7   SEVERE       :181     
 MBS:  2                                                             
                                                                     
                                                                     
                                                                     
                                                                     
 PernahHidap     Yearx                             Year     
 TIDAK:866   Min.   :2011-01-12 00:00:00.00   Min.   :2011  
 YA   : 22   1st Qu.:2013-10-06 12:00:00.00   1st Qu.:2013  
             Median :2017-08-02 12:00:00.00   Median :2017  
             Mean   :2017-02-12 11:16:12.97   Mean   :2017  
             3rd Qu.:2020-03-05 06:00:00.00   3rd Qu.:2020  
             Max.   :2022-12-30 00:00:00.00   Max.   :2022  
                                                            
      dur                                 duration_days   
 Min.   :-2246400s (~-3.71 weeks)         Min.   : 0.000  
 1st Qu.:345600s (~4 days)                1st Qu.: 4.000  
 Median :518400s (~6 days)                Median : 6.000  
 Mean   :543016.216216216s (~6.28 days)   Mean   : 6.981  
 3rd Qu.:691200s (~1.14 weeks)            3rd Qu.: 8.000  
 Max.   :5788800s (~9.57 weeks)           Max.   :67.000  
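
The summary shows a negative minimum for dur (about -3.71 weeks), which means at least one record has a notification date earlier than its onset date; taking abs() then turns that record into an ordinary positive duration. The following is a hedged sketch of how such records could be flagged before the absolute value is taken. It uses base R dates on toy rows; the column names mirror the dataset, while the rows and the names `signed_days` and `onset_after_notification` are made up for illustration.

```r
# Toy records: in the third row the onset date falls AFTER the notification date
toy <- data.frame(
  dateOnset      = as.Date(c("2011-01-07", "2011-01-05", "2011-03-01")),
  dateNotifikasi = as.Date(c("2011-01-12", "2011-01-14", "2011-02-03"))
)

# Signed difference in whole days (negative when onset follows notification)
toy$signed_days <- as.integer(
  difftime(toy$dateNotifikasi, toy$dateOnset, units = "days")
)

# Flag suspicious rows instead of silently taking abs()
toy$onset_after_notification <- toy$signed_days < 0

print(toy)
```

Flagged rows can then be checked against the source records, since a negative interval most likely points to a data-entry problem rather than a true duration.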
                                                          

4.4 SELECT VARIABLES OF INTEREST

library(dplyr)

data2 <- data1 %>%
  dplyr::select(Daerah, Umur, Jantina, Hamil, Bangsa, Warganegara, Pekerjaan, Kawasan, KepadatanParasit, KlasifikasiKes, Year, duration_days)

summary(data2)
      Daerah         Umur         Jantina    Hamil            Bangsa   
 LIPIS   :455   Min.   : 0.00   MALE  :742   NO :884   MELAYU    :512  
 JERANTUT:121   1st Qu.:25.00   FEMALE:146   YES:  4   ORANG ASLI:127  
 MARAN   : 75   Median :34.00                          INDONESIA :105  
 RAUB    : 71   Mean   :37.02                          CINA      : 46  
 ROMPIN  : 42   3rd Qu.:48.00                          BANGLADESH: 37  
 TEMERLUH: 41   Max.   :85.00                          INDIA     : 16  
 (Other) : 83                                          (Other)   : 45  
 Warganegara              Pekerjaan    Kawasan    KepadatanParasit   
 YA   :716   FOREST RELATED    :459   RURAL:797   Min.   :1.600e+01  
 TIDAK:172   NON FOREST RELATED:429   URBAN: 91   1st Qu.:1.445e+03  
                                                  Median :5.000e+03  
                                                  Mean   :1.929e+07  
                                                  3rd Qu.:2.070e+04  
                                                  Max.   :4.598e+09  
                                                  NA's   :1          
    KlasifikasiKes      Year      duration_days   
 INDIGENOUS:888    Min.   :2011   Min.   : 0.000  
                   1st Qu.:2013   1st Qu.: 4.000  
                   Median :2017   Median : 6.000  
                   Mean   :2017   Mean   : 6.981  
                   3rd Qu.:2020   3rd Qu.: 8.000  
                   Max.   :2022   Max.   :67.000  
                                                  
glimpse(data2)
Rows: 888
Columns: 12
$ Daerah           <fct> BENTONG, LIPIS, JERANTUT, LIPIS, RAUB, LIPIS, MARAN, …
$ Umur             <dbl> 22, 71, 28, 36, 32, 24, 22, 16, 30, 50, 52, 25, 34, 5…
$ Jantina          <fct> MALE, MALE, MALE, MALE, MALE, MALE, MALE, MALE, FEMAL…
$ Hamil            <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, N…
$ Bangsa           <fct> INDIA, CINA, MELAYU, MELAYU, MELAYU, MELAYU, MELAYU, …
$ Warganegara      <fct> YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, Y…
$ Pekerjaan        <fct> FOREST RELATED, NON FOREST RELATED, NON FOREST RELATE…
$ Kawasan          <fct> RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURA…
$ KepadatanParasit <dbl> 1200, 64000, 1500, 4640, 3040, 1120, 1800, 20000, 568…
$ KlasifikasiKes   <fct> INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIG…
$ Year             <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
$ duration_days    <int> 5, 9, 10, 7, 7, 8, 9, 8, 3, 5, 3, 5, 3, 9, 12, 7, 7, …
view(data2)

5 DESCRIPTIVE TABLE

# Create the descriptive table
table_summary <- data2 %>%
  tbl_summary(
    by = Daerah,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  add_overall() %>%
  modify_header(label ~ "**Variable**") %>%
  modify_spanning_header(
    all_stat_cols() ~ "**Summary Statistics**"
  ) %>%
  modify_caption("**Sociodemographic characteristic of Zoonotic Malaria infected individual based on District**")

# Print the table
table_summary
Sociodemographic characteristic of Zoonotic Malaria infected individual based on District

| Variable | Overall, N = 888¹ | BENTONG, N = 29¹ | LIPIS, N = 455¹ | JERANTUT, N = 121¹ | RAUB, N = 71¹ | MARAN, N = 75¹ | ROMPIN, N = 42¹ | TEMERLUH, N = 41¹ | KUANTAN, N = 32¹ | BERA, N = 17¹ | C.HIGHLANDS, N = 2¹ | PEKAN, N = 3¹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Umur | 37 (16) | 40 (14) | 37 (17) | 37 (14) | 37 (15) | 33 (16) | 38 (17) | 39 (17) | 37 (14) | 33 (10) | 27 (3) | 30 (9) |
| Jantina | | | | | | | | | | | | |
|     MALE | 742 (84%) | 25 (86%) | 366 (80%) | 107 (88%) | 62 (87%) | 67 (89%) | 38 (90%) | 30 (73%) | 28 (88%) | 14 (82%) | 2 (100%) | 3 (100%) |
|     FEMALE | 146 (16%) | 4 (14%) | 89 (20%) | 14 (12%) | 9 (13%) | 8 (11%) | 4 (9.5%) | 11 (27%) | 4 (13%) | 3 (18%) | 0 (0%) | 0 (0%) |
| Hamil | 4 (0.5%) | 0 (0%) | 3 (0.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2.4%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Bangsa | | | | | | | | | | | | |
|     INDIA | 16 (1.8%) | 2 (6.9%) | 4 (0.9%) | 2 (1.7%) | 2 (2.8%) | 0 (0%) | 2 (4.8%) | 3 (7.3%) | 0 (0%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
|     CINA | 46 (5.2%) | 1 (3.4%) | 15 (3.3%) | 12 (9.9%) | 5 (7.0%) | 0 (0%) | 8 (19%) | 0 (0%) | 4 (13%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
|     MELAYU | 512 (58%) | 14 (48%) | 270 (59%) | 71 (59%) | 40 (56%) | 50 (67%) | 9 (21%) | 30 (73%) | 18 (56%) | 9 (53%) | 0 (0%) | 1 (33%) |
|     ORANG ASLI | 127 (14%) | 4 (14%) | 59 (13%) | 9 (7.4%) | 11 (15%) | 13 (17%) | 19 (45%) | 5 (12%) | 1 (3.1%) | 3 (18%) | 2 (100%) | 1 (33%) |
|     INDONESIA | 105 (12%) | 6 (21%) | 53 (12%) | 18 (15%) | 8 (11%) | 7 (9.3%) | 4 (9.5%) | 3 (7.3%) | 5 (16%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
|     NEPAL | 9 (1.0%) | 1 (3.4%) | 4 (0.9%) | 2 (1.7%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
|     CAMBODIA | 9 (1.0%) | 0 (0%) | 5 (1.1%) | 1 (0.8%) | 1 (1.4%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     MYANMAR | 12 (1.4%) | 0 (0%) | 8 (1.8%) | 2 (1.7%) | 2 (2.8%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     BANGLADESH | 37 (4.2%) | 0 (0%) | 28 (6.2%) | 2 (1.7%) | 2 (2.8%) | 2 (2.7%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 1 (5.9%) | 0 (0%) | 1 (33%) |
|     PAKISTAN | 4 (0.5%) | 1 (3.4%) | 3 (0.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     THAILAND | 1 (0.1%) | 0 (0%) | 1 (0.2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     BUMIPUTRA SABAH | 4 (0.5%) | 0 (0%) | 2 (0.4%) | 2 (1.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     BUMIPUTRA SARAWAK | 1 (0.1%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     LAOS | 1 (0.1%) | 0 (0%) | 1 (0.2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
|     CHINA | 4 (0.5%) | 0 (0%) | 2 (0.4%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Warganegara | | | | | | | | | | | | |
|     YA | 716 (81%) | 22 (76%) | 357 (78%) | 97 (80%) | 58 (82%) | 64 (85%) | 38 (90%) | 38 (93%) | 25 (78%) | 13 (76%) | 2 (100%) | 2 (67%) |
|     TIDAK | 172 (19%) | 7 (24%) | 98 (22%) | 24 (20%) | 13 (18%) | 11 (15%) | 4 (9.5%) | 3 (7.3%) | 7 (22%) | 4 (24%) | 0 (0%) | 1 (33%) |
| Pekerjaan | | | | | | | | | | | | |
|     FOREST RELATED | 459 (52%) | 15 (52%) | 236 (52%) | 67 (55%) | 40 (56%) | 32 (43%) | 29 (69%) | 20 (49%) | 8 (25%) | 9 (53%) | 1 (50%) | 2 (67%) |
|     NON FOREST RELATED | 429 (48%) | 14 (48%) | 219 (48%) | 54 (45%) | 31 (44%) | 43 (57%) | 13 (31%) | 21 (51%) | 24 (75%) | 8 (47%) | 1 (50%) | 1 (33%) |
| Kawasan | | | | | | | | | | | | |
|     RURAL | 797 (90%) | 28 (97%) | 413 (91%) | 108 (89%) | 58 (82%) | 72 (96%) | 37 (88%) | 36 (88%) | 25 (78%) | 15 (88%) | 2 (100%) | 3 (100%) |
|     URBAN | 91 (10%) | 1 (3.4%) | 42 (9.2%) | 13 (11%) | 13 (18%) | 3 (4.0%) | 5 (12%) | 5 (12%) | 7 (22%) | 2 (12%) | 0 (0%) | 0 (0%) |
| KepadatanParasit | 19,290,477 (234,170,084) | 120,091,096 (636,247,620) | 16,767,834 (225,522,199) | 1,019,013 (4,545,677) | 568,745 (3,527,452) | 35,240,340 (302,971,560) | 8,872,412 (37,670,508) | 66,391,762 (417,791,045) | 4,554,367 (15,629,632) | 5,945 (8,555) | 8,860,528 (12,519,535) | 22,102 (20,337) |
|     Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| KlasifikasiKes | | | | | | | | | | | | |
|     INDIGENOUS | 888 (100%) | 29 (100%) | 455 (100%) | 121 (100%) | 71 (100%) | 75 (100%) | 42 (100%) | 41 (100%) | 32 (100%) | 17 (100%) | 2 (100%) | 3 (100%) |
| Year | 2,016.6 (3.5) | 2,017.4 (3.7) | 2,016.4 (3.5) | 2,015.8 (3.4) | 2,017.7 (2.8) | 2,016.7 (3.3) | 2,017.5 (3.5) | 2,015.4 (3.3) | 2,018.7 (3.2) | 2,019.0 (2.1) | 2,018.0 (1.4) | 2,020.7 (2.3) |
| duration_days | 7 (5) | 8 (3) | 6 (5) | 7 (7) | 7 (4) | 8 (6) | 9 (5) | 8 (4) | 9 (5) | 9 (5) | 6 (1) | 6 (1) |

¹ Mean (SD); n (%)

Comment:

The table summarizes the distribution of age, gender, citizenship, forestry-related work, area type, parasite density, and duration from onset to diagnosis across 11 districts in Pahang, covering 888 patients. The majority of patients were male (84%), Malaysian citizens (81%), and from rural areas (90%); slightly more than half (52%) worked in forestry-related jobs. The mean age of patients was 37 years (SD = 16), and the mean time from onset to diagnosis was 7 days (SD = 5). The table shows that most infections were recorded in Lipis district (n = 455).
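
For parasite density, the table reports a standard deviation (234,170,084) far larger than the mean (19,290,477), which points to a strongly right-skewed variable. For such a variable, the median and quartiles often describe the typical patient better than the mean and SD. The sketch below illustrates the difference in base R on invented numbers; the vector `parasite` is a toy example, not the study data.

```r
# Toy parasite counts with one extreme value (invented numbers)
parasite <- c(16, 1200, 1500, 3040, 5000, 20000, 64000, 4.6e9)

# Mean and SD are pulled upward by the single extreme count
mean_sd <- c(mean = mean(parasite), sd = sd(parasite))

# Median and quartiles stay close to the bulk of the data
median_iqr <- c(
  median = median(parasite),
  q1 = quantile(parasite, 0.25, names = FALSE),
  q3 = quantile(parasite, 0.75, names = FALSE)
)

print(mean_sd)
print(median_iqr)
```

In tbl_summary(), the corresponding statistic string would be "{median} ({p25}, {p75})" in place of "{mean} ({sd})".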

6 DATA VISUALIZATION

6.1 LINE PLOT

The line graph visualizes the number of Zoonotic Malaria cases per year from 2011 to 2022, which helps identify the trend in cases over the years.

geom_line draws the overall trend of malaria cases across time. The line is drawn in blue with a thickness of 1.

geom_point highlights the exact value for each year, complementing the line for better clarity. The points are drawn in red (color = "red") and size = 2 sets the size of each point.

# Define years and corresponding cases
Year <- 2011:2022
Cases <- c(36, 92, 118, 91, 34, 32, 74, 114, 69, 54, 100, 74)

# Create data frame
data_cases <- data.frame(Year, Cases)

# View the table
print(data_cases)
   Year Cases
1  2011    36
2  2012    92
3  2013   118
4  2014    91
5  2015    34
6  2016    32
7  2017    74
8  2018   114
9  2019    69
10 2020    54
11 2021   100
12 2022    74
ggplot(data_cases, aes(x = Year, y = Cases)) +
  geom_line(color = "blue", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(color = "red", size = 2) +
  labs(title = "Malaria Cases by Year",
       x = "Year", y = "Number of Cases") +
  theme_minimal()

The line graph above shows the annual number of malaria cases from 2011 to 2022. Case numbers fluctuate considerably from year to year. Peaks occur in 2013 (118 cases) and 2018 (114 cases), which may reflect outbreak periods or lapses in control efforts. Each peak is followed by a decline, with the lowest counts in 2015 (34 cases) and 2016 (32 cases), possibly reflecting improved management or a natural downturn in transmission. A further rise occurs in 2021 (100 cases), slightly below the earlier peaks, followed by a reduction to 74 cases in 2022.
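
The yearly totals in this section were typed in by hand. As a possible alternative, they can be derived from the case-level data so that the plot stays in sync with the dataset. The sketch below assumes a toy stand-in for data2 with one row per case; the rows and the name `cases_by_year` are illustrative.

```r
library(dplyr)

# Toy stand-in for data2 with one row per case (invented rows)
toy <- tibble(Year = c(2011, 2011, 2012, 2012, 2012, 2013))

# One row per year with the number of cases; the count column is named
# Cases to match the data_cases table used for the line plot
cases_by_year <- toy %>%
  count(Year, name = "Cases")

print(cases_by_year)
```

With the real data, `data2 %>% count(Year, name = "Cases")` should reproduce the twelve totals listed above, which sum to 888, the number of patients in the dataset.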

6.2 BAR PLOT

The bar plot visualizes the distribution of Zoonotic Malaria cases for each district across the 12-year period (2011 to 2022). Additional bar plots compare the number of cases by citizenship, gender and work nature across the years and districts. These visualizations help identify the groups with higher numbers of cases, so that control and prevention activities can focus on individuals with those backgrounds.

The ggplot2 package was used to construct the bar plots, with the ggplot() function specifying the dataset and aesthetic mappings. The aes() function maps Year or district (Daerah) to the x-axis and a grouping variable (district, citizenship, gender or work nature) to the fill aesthetic. The geom_bar() function draws the bars; with stat = "identity" it uses the pre-computed counts and stacks the groups within each bar. The geom_hline() function adds a horizontal line, which can be used to visualize a threshold level. Where a title and axis labels are set, the labs() function is used, and theme_minimal() gives the plot a clean and simple appearance. The scale_fill_manual() function sets the colors for the fill categories manually, so that the groups are easy to tell apart.
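
Whether grouped bars appear stacked or side by side depends on the position setting of the bar layer. The following minimal sketch shows both variants on invented counts; geom_col() is used here as the equivalent of geom_bar(stat = "identity"), and the data frame `toy` and plot names are illustrative.

```r
library(ggplot2)

# Toy counts per year and gender (invented numbers)
toy <- data.frame(
  Year = rep(c(2011, 2012), each = 2),
  Jantina = rep(c("MALE", "FEMALE"), times = 2),
  Status = c(30, 6, 80, 12)
)

# Default position: groups are stacked within each year
p_stacked <- ggplot(toy, aes(x = Year, y = Status, fill = Jantina)) +
  geom_col()

# position = "dodge": groups are placed side by side instead
p_dodged <- ggplot(toy, aes(x = factor(Year), y = Status, fill = Jantina)) +
  geom_col(position = "dodge")

p_stacked
p_dodged
```

Stacked bars make the yearly total easy to read, while dodged bars make it easier to compare the groups against a reference line such as the one drawn by geom_hline().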

cong_dat <- data2 %>%
  group_by(Year, Daerah) %>%
  summarise(Status = n())
cong_dat

6.2.1 Cases across the district from 2011 to 2022

cases_malaria <- ggplot(data2, aes(x = Daerah)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Cases across the district between 2011 to 2022",
       x = "District",
       y = "Count") +
  theme_minimal()

cases_malaria

The district of Lipis recorded the highest cumulative number of Zoonotic Malaria cases from 2011 to 2022, followed by Jerantut, Maran, Raub and the others. The district with the fewest cases between 2011 and 2022 is Cameron Highlands.

6.2.2 Cases across the district for each year

ggplot(cong_dat, aes(x = Year, y = Status, fill = Daerah)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 10) +
  scale_fill_manual(values = c("blue", "red", "orange", "yellow", "pink", "purple", "green", "brown","lightgreen", "lightgrey", "chartreuse2"))

Comparing cases on a year-by-year basis gives a similar finding: Lipis recorded the highest number of cases every year, followed by Jerantut and the others. Jerantut contributed a large portion of the cases in the early part of the study period, up until 2018. In the last 3 years, however, other districts such as Maran, Raub and Kuantan had numbers of cases comparable to Jerantut.

6.2.3 Total number of cases by citizenship from 2011 to 2022

cong_dat2 <- data2 %>%
  group_by(Year, Warganegara) %>%
  summarise(Status = n())

ggplot(cong_dat2, aes(x = Year, y = Status, fill = Warganegara)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("blue", "red"))

When comparing citizen and non-citizen cases across the years, the majority of cases were among Malaysian citizens.

6.2.4 Comparison between genders across districts and years

cong_dat3 <- data2 %>%
  group_by(Year, Jantina) %>%
  summarise(Status = n())

ggplot(cong_dat3, aes(x = Year, y = Status, fill = Jantina)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("purple", "grey"))

cong_dat4 <- data2 %>%
  group_by(Daerah, Jantina) %>%
  summarise(Status = n())

ggplot(cong_dat4, aes(x = Daerah, y = Status, fill = Jantina)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("purple", "grey"))

The majority of cases were among males. In Lipis district, the number of female cases is higher than in the other districts, which have relatively similar numbers of female cases. Female cases never reached 30 in any year.

6.2.5 Comparison of work nature of cases across districts

cong_dat5 <- data2 %>%
  group_by(Daerah, Pekerjaan) %>%
  summarise(Status = n())

ggplot(cong_dat5, aes(x = Daerah, y = Status, fill = Pekerjaan)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("darkgreen", "darkred"))

There is not much difference in the number of cases between forest-related and non-forest-related work across the districts.

6.3 BOXPLOT

box_plot1 <- ggplot(data2, aes(x = Daerah, y = Umur, fill = Daerah)) +
  geom_boxplot() +
  labs(title = "Box Plot of age of zoonotic malaria cases between district from 2011 to 2022",
       x = "District",
       y = "Age") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")

box_plot1

The figure is a boxplot illustrating the distribution of age across the 11 districts. The median age of patients with Zoonotic Malaria in each district was between 20 and 40 years, indicating young adults. The age range was approximately similar between districts, except for Bera, Cameron Highlands and Pekan.
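
The quantities a boxplot displays (median and quartiles) can also be tabulated per district, together with the number of patients behind each box. The sketch below uses dplyr on invented rows; the column names mirror data2, while the values and the name `age_summary` are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for data2 (invented ages)
toy <- tibble(
  Daerah = c("LIPIS", "LIPIS", "LIPIS", "RAUB", "RAUB", "RAUB"),
  Umur = c(22, 36, 71, 24, 32, 50)
)

# Group size, median and quartiles of age for each district
age_summary <- toy %>%
  group_by(Daerah) %>%
  summarise(
    n = n(),
    median_age = median(Umur),
    q1 = quantile(Umur, 0.25, names = FALSE),
    q3 = quantile(Umur, 0.75, names = FALSE),
    .groups = "drop"
  )

print(age_summary)
```

Reporting n alongside the quartiles matters here because the descriptive table lists only 2 patients in C.HIGHLANDS and 3 in PEKAN, so the boxes for those districts rest on very few observations.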

6.4 SCATTER PLOT

scatter_plot1 <- ggplot(data2, aes(x = duration_days, y = KepadatanParasit, color = duration_days)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Duration from Onset to Diagnosis vs. Parasite Count",
    x = "Duration (days)",
    y = "Parasite Count"
  ) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal()

scatter_plot1

The scatterplot illustrates the relationship between the duration from symptom onset to diagnosis (in days) and the parasite count among patients. Notably, the wide range in parasite counts, spanning several orders of magnitude, has resulted in a highly skewed distribution. This skewness hampers the visualization of cases with relatively low parasite counts, which appear compressed near the lower portion of the y-axis.

Moreover, the data points are predominantly clustered along the lower axis, reflecting a concentration of cases with low parasite density across varying durations. This pattern, combined with the absence of a discernible upward or downward trend, suggests a weak or negligible correlation between time to diagnosis and parasite load. Because the skewed scale hides most of the low-count cases, this visual impression should be treated as tentative, but it suggests that delay in diagnosis is not a strong predictor of parasite burden in this cohort.
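
One possible way to address the compression described above is to plot parasite counts on a log10 axis and to quantify the association with a rank-based (Spearman) correlation, which is not dominated by a few extreme counts. The sketch below is an illustration on invented values, not a re-analysis of the dataset; the names `scatter_log` and `rho` are hypothetical.

```r
library(ggplot2)

# Toy data spanning several orders of magnitude (invented values)
toy <- data.frame(
  duration_days = c(3, 5, 5, 7, 8, 9, 10, 12),
  KepadatanParasit = c(1200, 64000, 1500, 4640, 3040, 1120, 20000, 4.6e9)
)

# Same scatter plot, with the y-axis on a log10 scale so that
# low counts are no longer squeezed against the x-axis
scatter_log <- ggplot(toy, aes(x = duration_days, y = KepadatanParasit)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_y_log10() +
  labs(x = "Duration (days)", y = "Parasite count (log10 scale)") +
  theme_minimal()

# Spearman correlation uses ranks; rows with missing values are dropped
rho <- cor(toy$duration_days, toy$KepadatanParasit,
           method = "spearman", use = "complete.obs")

scatter_log
print(rho)
```

A coefficient close to zero on the real data would support the reading of a weak association, while a clearly positive or negative value would call for a closer look.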

7 Recommendation

  1. Control and prevention actions should be given priority in areas with a high burden of cases, especially Lipis district.

  2. Awareness of Zoonotic Malaria infection should be targeted at young adult males and Malaysian citizens, irrespective of their work nature, as the data show that these groups account for most of the Zoonotic Malaria cases.

8 Animation

To make the visualisation more dynamic, the graphs and plots can be transformed into animations.

knitr::include_graphics("malaria_cases.gif")

knitr::include_graphics("malaria_gender.gif")
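
The code that produced the GIF files is not shown in this report. As a rough sketch of how a comparable animation could be built with gganimate (loaded earlier), the line plot of yearly cases can be revealed progressively along the Year axis. The helper name `render_cases_gif` and the output file name are hypothetical, and the frame settings are arbitrary choices.

```r
library(ggplot2)
library(gganimate)

# Yearly totals as listed in the line plot section
data_cases <- data.frame(
  Year = 2011:2022,
  Cases = c(36, 92, 118, 91, 34, 32, 74, 114, 69, 54, 100, 74)
)

# Same line plot as before, revealed step by step along Year
anim <- ggplot(data_cases, aes(x = Year, y = Cases)) +
  geom_line(color = "blue") +
  geom_point(color = "red", size = 2) +
  labs(title = "Malaria Cases by Year",
       x = "Year", y = "Number of Cases") +
  theme_minimal() +
  transition_reveal(Year)

# Rendering helper (hypothetical name); writing a GIF needs the gifski package
render_cases_gif <- function(path = "malaria_cases_sketch.gif") {
  anim_save(path, animation = animate(anim, nframes = 50, fps = 10,
                                      renderer = gifski_renderer()))
}
```

Calling `render_cases_gif()` writes the file, which can then be embedded with knitr::include_graphics() in the same way as the GIFs above.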

9 References

  1. https://www.coursera.org/learn/jhu-advanced-data-visualization-r
  2. https://posit-connect.kk.usm.my/content/8f474ac1-9027-479e-bdf4-b6b8d6083bab/Data%20Visualization%20Assignment.html